ON THE $L_p$-ERROR OF MONOTONICITY
CONSTRAINED ESTIMATORS

By Cécile Durot
Université Paris Sud

We aim at estimating a function $\lambda : [0,1] \to \mathbb{R}$, subject to the
constraint that it is decreasing (or increasing). We provide a unified
approach for studying the $L_p$-loss of an estimator defined as the
slope of a concave (or convex) approximation of an estimator of a
primitive of $\lambda$, based on $n$ observations. Our main task is to prove
that the $L_p$-loss is asymptotically Gaussian with explicit (though
unknown) asymptotic mean and variance. We also prove that the local
$L_p$-risk at a fixed point and the global $L_p$-risk are of order $n^{-p/3}$.
Applying the results to the density and regression models, we recover
and generalize known results about Grenander and Brunk estimators.
Also, we obtain new results for the Huang-Wellner estimator of
a monotone failure rate in the random censorship model, and for an
estimator of the monotone intensity function of an inhomogeneous
Poisson process.

1. Introduction. A frequently encountered problem in nonparametric
statistics is to estimate a monotone function $\lambda$ on a compact interval, say,
$[0,1]$. Grenander [5], Brunk [2] and Huang and Wellner [9] propose estimators
defined as the slope of a concave (or convex) approximation of an estimator
of a primitive of $\lambda$, in the cases where $\lambda$ is a monotone density function, a
monotone regression mean and a monotone failure rate, respectively. These
estimators have aroused great interest since they are nonparametric, data
driven (they do not require the choice of a smoothing parameter) and easy
to implement using, for example, the pool adjacent violators algorithm; see
[1]. Moreover, Reboul [14] provides nonasymptotic control of their $L_1$-risk,
which proves that they are optimal in some sense. From an asymptotic point
of view, Prakasa Rao [13], Brunk [3] and Huang and Wellner [9] prove
cube-root convergence of these estimators at a fixed point and obtain the
pointwise asymptotic distribution; Groeneboom, Hooghiemstra and Lopuhaä [8]
and Durot [4] prove a central limit theorem for the $L_1$-error of the Grenander
and Brunk estimators, respectively, and Kulikov and Lopuhaä [11] generalize
the result in [8] to the $L_p$-error of the Grenander estimator.

In this paper we consider the problem of estimating a monotone function
$\lambda : [0,1] \to \mathbb{R}$ in a general model. We provide a unified approach for studying
the $L_p$-error of estimators defined as the slope of a concave (or convex)
approximation of an estimator of a primitive of $\lambda$. We prove that, at a point
that may depend on the number $n$ of observations and is far enough from
0 and 1, the local $L_p$-risk is of order $n^{-p/3}$. We also provide control of the
local $L_p$-risk near the boundaries and derive the result that the global
$L_p$-risk is of order $n^{-p/3}$. Our main result is a central limit theorem for the
$L_p$-error; see Theorem 2: we prove that the $L_p$-error is asymptotically Gaussian
with explicit (though unknown) asymptotic mean and variance. Applying
the results to the regression and density models, we recover the results of
[4, 8, 11] about Brunk and Grenander estimators. Also, we obtain new results
for the Huang-Wellner estimator in the random censorship model, and for
an estimator of a monotone intensity function based on $n$ independent copies
of an inhomogeneous Poisson process. We believe that our method applies
to other models.

Our main motivation for proving asymptotic normality of the $L_p$-error
relies on goodness-of-fit tests. Assume indeed we wish to test $H_0 : \lambda = \lambda_0$ for
a given decreasing (resp. increasing) $\lambda_0$, against the nonparametric
alternative that $\lambda$ is decreasing (resp. increasing). Using asymptotic normality and
proper estimators for the asymptotic mean and variance, we can draw from
the observations a normalization of the $L_p$-distance between $\hat\lambda_n$ and $\lambda_0$ that
converges under $H_0$ to the standard Gaussian law. The test that rejects $H_0$
if this normalization exceeds the $(1-\alpha)$-quantile of the standard Gaussian
law has asymptotic level $\alpha$. With additional effort, Theorem 2 can also be
used to test a composite null hypothesis. This will be detailed elsewhere.
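
The decision rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the statistic uses the normalization of Theorem 2, and `mean_hat` and `var_hat` are hypothetical plug-in estimates of the asymptotic mean and variance, whose construction is not given in this paper. The function name and signature are likewise illustrative.

```python
from statistics import NormalDist


def lp_goodness_of_fit(lp_distance_p, n, p, mean_hat, var_hat, alpha=0.05):
    """One-sided test of H0: lambda = lambda_0 based on the L_p-error.

    lp_distance_p : integral over [0, 1] of |lambda_hat_n - lambda_0|^p
    mean_hat      : estimate of the asymptotic mean (hypothetical input)
    var_hat       : estimate of the asymptotic variance (hypothetical input)
    Returns the normalized statistic and the rejection decision.
    """
    # Normalization suggested by Theorem 2:
    # n^{1/6} (n^{p/3} * L_p-distance^p - mean) / sd
    statistic = n ** (1 / 6) * (n ** (p / 3) * lp_distance_p - mean_hat) / var_hat ** 0.5
    # Reject when the statistic exceeds the (1 - alpha)-quantile of N(0, 1).
    critical = NormalDist().inv_cdf(1 - alpha)
    return statistic, statistic > critical
```

For instance, with `n = 64`, `p = 1.5`, `mean_hat = 3` and `var_hat = 4`, an observed value `lp_distance_p = 1.0` gives a statistic close to 5 and the null hypothesis is rejected at level 0.05.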

The paper is organized as follows. In Section 2 we define and study our 
estimator in a general model. In Section 3 we apply the results of Section 
2 to the random censorship, inhomogeneous Poisson process, regression and 
density models. The results of Section 2 are proved in Sections 4 and 5 and 
the results of Section 3 are proved in Section 6. 

2. Main results. We aim at estimating a function $\lambda : [0,1] \to \mathbb{R}$ subject
to the constraint that it is nonincreasing (or nondecreasing), on the basis
of $n$ observations. Assume we have at hand a cadlag (i.e., right continuous
with left-hand limits at every point) step estimator $\Lambda_n$ of

$$\Lambda(t) = \int_0^t \lambda(u)\,du, \qquad t \in [0,1].$$

We define the monotone estimator of $\lambda$ as follows:

Definition 1. Let $\Lambda_n : [0,1] \to \mathbb{R}$ be a cadlag step process. If $\lambda$ is
nonincreasing (resp. nondecreasing), then the monotone estimator $\hat\lambda_n$ based on
$\Lambda_n$ is defined as the left-hand slope of the least concave majorant (resp.
greatest convex minorant) of $\Lambda_n$, with $\hat\lambda_n(0) = \lim_{t\downarrow 0}\hat\lambda_n(t)$.

Thus, the monotone estimator is a step process that can jump only at the
jump points of $\Lambda_n$; it is monotone and left-continuous.
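
As an illustration of Definition 1 in the nonincreasing case, the following sketch computes the left-hand slopes of the least concave majorant of the points $(t, \Lambda_n(t))$ taken at 0, 1 and the jump points. It assumes that $\Lambda_n$ is nondecreasing, so that the majorant of these points coincides with that of the step function itself (as noted at the end of this section). The function name and the input layout are choices made for this example only.

```python
def lcm_left_slopes(ts, values):
    """Left-hand slopes of the least concave majorant of the points
    (ts[i], values[i]), where ts is strictly increasing.

    Returns a list s with s[i] the slope of the majorant on (ts[i], ts[i+1]].
    """
    hull = [0]  # indices of the vertices of the majorant found so far
    for i in range(1, len(ts)):
        # Remove the last vertex while it lies on or below the chord
        # joining its predecessor to the new point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            slope_ab = (values[b] - values[a]) / (ts[b] - ts[a])
            slope_bi = (values[i] - values[b]) / (ts[i] - ts[b])
            if slope_bi >= slope_ab:
                hull.pop()
            else:
                break
        hull.append(i)
    slopes = []
    for a, b in zip(hull, hull[1:]):
        s = (values[b] - values[a]) / (ts[b] - ts[a])
        slopes.extend([s] * (b - a))  # same slope on every cell under this chord
    return slopes


# Example: a step function observed at its jump points.
ts = [0.0, 0.25, 0.5, 0.75, 1.0]
values = [0.0, 0.5, 0.6, 0.9, 1.0]
estimate = lcm_left_slopes(ts, values)
```

In this example the point (0.5, 0.6) lies below the majorant, so the two middle cells share a common slope and the resulting estimate is nonincreasing.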

Hereafter, $M_n$ denotes the process defined on $[0,1]$ by $M_n = \Lambda_n - \Lambda$. We
make the following assumptions.

(A1) $\lambda$ is monotone and differentiable on $[0,1]$ with $\inf_t |\lambda'(t)| > 0$ and
$\sup_t |\lambda'(t)| < \infty$.

(A2) There exists $C > 0$ such that, for all $x \ge n^{-1/3}$ and $t \in [0,1]$,

$$E\Big[\sup_{u\in[0,1],\ x/2\le|t-u|\le x}(M_n(u)-M_n(t))^2\Big] \le \frac{Cx}{n}. \tag{1}$$

(A2') Inequality (1) holds for all $x > 0$ and $t \in \{0,1\}$.

First, we give a control of the local $L_p$-risk of $\hat\lambda_n$ at a time $t$ that is allowed
to depend on $n$: it is of order $n^{-p/3}$ if $t$ is far enough from 0 and 1 [in
particular, if $t \in (0,1)$ does not depend on $n$]. We obtain a control of larger
order if $t$ is near a boundary and derive a control of the global $L_p$-risk:

Theorem 1. Assume (A1), (A2), (A2') and let $p \in [1,2)$. Then there
exists $K > 0$, which depends only on $\lambda$, $C$ and $p$, such that

$$E|\hat\lambda_n(t) - \lambda(t)|^p \le Kn^{-p/3}$$

for all $t \in [n^{-1/3}, 1 - n^{-1/3}]$, and

$$E|\hat\lambda_n(t) - \lambda(t)|^p \le K[n(t \wedge (1-t))]^{-p/2} \tag{2}$$

for all $t \in (0, n^{-1/3}] \cup [1 - n^{-1/3}, 1)$.

Corollary 1. Assume (A1), (A2), (A2') and let $p \in [1,2)$. Then

$$E\Big[\int_0^1 |\hat\lambda_n(t) - \lambda(t)|^p\,dt\Big] = O(n^{-p/3}).$$



Note that Theorem 1 does not provide a control of the risk at $t \in \{0,1\}$. In
fact, it is known that the monotone estimator is not consistent at the points
0 and 1 in particular models; see [17] for the density model. To control the
error at the boundaries, we assume the following.

(A3) $\hat\lambda_n(0)$ and $\hat\lambda_n(1)$ are stochastically bounded.

The following lemma provides a sufficient condition for (A3), which will be
useful for applications.

Lemma 1. Assume (A1), (A2) and (A2'). If for every $\varepsilon > 0$ there exists
$\delta > 0$ such that the probability that $\Lambda_n$ jumps in $(0,\delta/n)$ or in $(1-\delta/n, 1)$
is less than $\varepsilon$, then (A3) holds.

Proof. Let $x$, $\delta$ and $\varepsilon$ be fixed positive numbers. One has

$$P(|\hat\lambda_n(0)| > x) \le P(|\hat\lambda_n(\delta/n)| > x) + P(\hat\lambda_n(0) \ne \hat\lambda_n(\delta/n)).$$

From Theorem 1, $\hat\lambda_n(\delta/n)$ is stochastically bounded. Moreover, $\hat\lambda_n(0)$ can
differ from $\hat\lambda_n(\delta/n)$ only if $\Lambda_n$ jumps in $(0,\delta/n)$. Hence, both probabilities
in the above upper bound are less than $\varepsilon$, provided $\delta$ is small enough and $x$
is large enough, whence $\hat\lambda_n(0) = O_P(1)$. Likewise, $\hat\lambda_n(1) = O_P(1)$. □

To compute the asymptotic distribution of the $L_p$-error, we assume that
$M_n$ can be approximated in distribution by a Gaussian process. Specifically,
we assume the following.

(A4) Let $B_n$ be either a Brownian bridge or a Brownian motion. There
exist $q > 12$, $C_q > 0$, $L : [0,1] \to \mathbb{R}$ and versions of $M_n$ and $B_n$ such that

$$P\Big(n^{1-1/q}\sup_{t\in[0,1]}|M_n(t) - n^{-1/2}B_n\circ L(t)| > x\Big) \le C_q x^{-q}$$

for all $x \in (0,n]$. Moreover, $L$ is increasing and twice differentiable on $[0,1]$
with $\sup_t |L''(t)| < \infty$ and $\inf_t L'(t) > 0$.

We also need to define the following process $X$:

$$X(a) = \operatorname*{argmax}_{u\in\mathbb{R}}\{-(u-a)^2 + W(u)\}, \qquad a \in \mathbb{R}, \tag{3}$$

where $W$ is a standard two-sided Brownian motion (see [6, 7] for a precise
description of this process). It is known that, for every $p > 0$, $E|X(0)|^p$ is
finite and the following number $k_p$ is well defined and finite:

$$k_p = \int_0^\infty \operatorname{cov}(|X(0)|^p, |X(a) - a|^p)\,da.$$
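
To get a feel for the process defined in (3), one can approximate $X(a)$ by simulation. The sketch below is only a rough illustration and not part of the argument: it discretizes a two-sided Brownian motion on a grid, restricts the maximization to a window around $a$ (the window half-width and the grid step are arbitrary choices made here), and uses the fact that replacing $W(u)$ by $W(u) - W(a)$ does not change the location of the maximum.

```python
import math
import random


def simulate_X(a, rng, half_width=3.0, step=0.01):
    """Approximate X(a) = argmax_u {-(u - a)^2 + W(u)} on the grid
    a + k*step, |k*step| <= half_width (truncation is an approximation)."""
    n_steps = int(round(half_width / step))
    sd = math.sqrt(step)  # standard deviation of one Brownian increment
    best_u, best_val = a, 0.0  # value of the criterion at u = a
    for direction in (1.0, -1.0):  # right and left branches of the two-sided motion
        w = 0.0
        for k in range(1, n_steps + 1):
            w += rng.gauss(0.0, sd)
            v = k * step
            val = -v * v + w
            if val > best_val:
                best_u, best_val = a + direction * v, val
    return best_u


rng = random.Random(0)
draws = [simulate_X(0.0, rng) for _ in range(300)]
mean_abs = sum(abs(x) for x in draws) / len(draws)  # crude Monte Carlo value of E|X(0)|
```

A Monte Carlo average of $|X(0)|^p$ over such draws gives a crude numerical value for the moment $E|X(0)|^p$ that appears in Theorem 2; a serious computation would need a finer grid and many more replications.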

We are now in position to state our main result. 

Theorem 2. Assume (A1), (A2'), (A3) and (A4). Assume, moreover,
there are $C' > 0$ and $s > 3/4$ with

$$|\lambda'(t) - \lambda'(x)| \le C'|t-x|^s \quad\text{for all } t,x \in [0,1]. \tag{4}$$

Let $p \in [1,5/2)$. Then with $m_p = E|X(0)|^p \int_0^1 |4\lambda'(t)L'(t)|^{p/3}\,dt$,

$$n^{1/6}\Big(n^{p/3}\int_0^1 |\hat\lambda_n(t) - \lambda(t)|^p\,dt - m_p\Big)$$

converges in distribution as $n \to \infty$ to the Gaussian law with mean zero and
variance $\sigma_p^2 = 8k_p \int_0^1 |4\lambda'(t)L'(t)|^{2(p-1)/3} L'(t)\,dt$.
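
Once $\lambda'$ and $L'$ are specified, the two integrals in Theorem 2 are easy to evaluate numerically. The sketch below uses a midpoint rule; the constants $E|X(0)|^p$ and $k_p$ are taken as given inputs (`abs_moment_X0` and `k_p` are hypothetical argument names, and no numerical values for them are supplied here).

```python
def asymptotic_mean_variance(lam_prime, L_prime, p, abs_moment_X0, k_p, grid=1000):
    """Midpoint-rule evaluation of
        m_p       = E|X(0)|^p * int_0^1 |4 lam'(t) L'(t)|^{p/3} dt
        sigma_p^2 = 8 k_p * int_0^1 |4 lam'(t) L'(t)|^{2(p-1)/3} L'(t) dt.
    lam_prime and L_prime are callables on [0, 1].
    """
    h = 1.0 / grid
    mean_integral = 0.0
    var_integral = 0.0
    for i in range(grid):
        t = (i + 0.5) * h  # midpoint of the i-th cell
        w = abs(4.0 * lam_prime(t) * L_prime(t))
        mean_integral += w ** (p / 3.0) * h
        var_integral += w ** (2.0 * (p - 1.0) / 3.0) * L_prime(t) * h
    return abs_moment_X0 * mean_integral, 8.0 * k_p * var_integral
```

For example, with $\lambda'(t) = -1$, $L'(t) = 1$ and $p = 1$, the function returns $4^{1/3}\,E|X(0)|$ and $8k_1$, as can be checked by hand from the formulas above.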

Note that our proof of Theorem 2 is partly inspired by [4, 8, 11]. As
in those papers, a key step consists in proving that the $L_p$-error of $\hat\lambda_n$ is
asymptotically equivalent to an $L_p$-error of $U_n$, the inverse process of $\hat\lambda_n$.
In the present approach, the proof is quite simple (even for $p > 1$) thanks
to the use of Theorem 1. Another key step consists in approximating a
proper normalization of $U_n(a)$ by the location of the maximum of a drifted
Brownian motion. In the present approach, thanks to Proposition 1 in [4],
we deal with a parabolic drift independent of $n$, whereas in [8] and [11]
the considered drift depends on $n$ and is only close to parabolic (which
brings about technicalities, e.g., in the computation of asymptotic moments).
Finally, asymptotic normality is proved using Bernstein's method of big
blocks and small blocks, as in [8] and [11].

Let us comment on the assumptions in Theorem 2. On one hand, the
contribution of the boundaries to the $L_p$-error is not negligible for $p \ge 5/2$
because $\hat\lambda_n$ converges slowly to $\lambda$ near 0 and 1 (this was already stressed
for the density model in [11]). This is the reason why we restrict ourselves to
$p < 5/2$. On the other hand, our proof of Theorem 2 relies on Proposition
1 of [4], which provides a control of the error we make when we approximate
the location of the maximum of a given process by that of a drifted
Brownian motion. The assumptions $q > 12$ and $s > 3/4$ emerge when using
this proposition; see Lemma 5 below. We believe that the proposition can
be improved with the assumptions $q > 12$ and $s > 3/4$ being weakened.

To conclude this section, we comment on a slight modification of $\hat\lambda_n$. Let
$\mathcal{C}_n$ be the set consisting of 0, 1 and the jump points of $\Lambda_n$, and let $C_n$ be
the "cumulative sum diagram" consisting of the points $(t, \Lambda_n(t))$, $t \in \mathcal{C}_n$. If
$\lambda$ is nonincreasing (resp. nondecreasing), let $\tilde\lambda_n$ be the left-hand slope of the
least concave majorant (resp. greatest convex minorant) of $C_n$. Then $\tilde\lambda_n$ and
$\hat\lambda_n$ are identical if $\Lambda_n$ is nondecreasing and $\lambda$ is nonincreasing, but they may
differ otherwise. In some applications, $\tilde\lambda_n$ may be preferred to $\hat\lambda_n$ since, for
instance, the least-squares estimator of a monotone regression mean takes
the form $\tilde\lambda_n$. Therefore, we now describe the asymptotic behavior of $\tilde\lambda_n$.
Let $\tilde\Lambda_n$ be the continuous piecewise-affine version of $\Lambda_n$, which means that
$\tilde\Lambda_n(t) = \Lambda_n(t)$ at every $t \in \mathcal{C}_n$, and $\tilde\Lambda_n$ is affine in between two consecutive
such points. Assume

$$\sup_{t\in[0,1]} E[(\tilde\Lambda_n(t) - \Lambda_n(t))^2] \le Cn^{-4/3} \tag{5}$$

for some $C > 0$. Then Theorem 1 and Corollary 1 remain true with $\hat\lambda_n$
replaced by $\tilde\lambda_n$. On the other hand, assume

$$E\Big[\sup_{t\in[0,1]} |\tilde\Lambda_n(t) - \Lambda_n(t)|^q\Big] \le Cn^{1-q} \tag{6}$$

for some $q > 12$ and $C > 0$. Then Theorem 2 remains true with $\hat\lambda_n$ replaced
by $\tilde\lambda_n$. The proof of these results is omitted. It is worth noticing that the
extra assumptions (5) and (6) hold in every application we consider in
Section 3.

3. Applications. In this section we consider several models where it may
be interesting to estimate a function $\lambda$ on $[0,1]$ subject to a monotonicity
constraint. In each model we propose an estimator $\Lambda_n$ of $\Lambda$, we give
sufficient conditions for the assumptions (A2), (A2'), (A3) and (A4), and we
make explicit the function $L$ in (A4). In particular, this provides sufficient
conditions for the $L_p$-error of the monotone estimator to be asymptotically
Gaussian with explicit asymptotic mean and variance. It is worth noticing
that, in each considered application, (A2) and (A2') follow from Doob's
inequality and the fact that a proper modification of $M_n$ is a martingale. Also,
(A4) follows from an embedding argument similar to that of Komlós, Major
and Tusnády [10].

3.1. The random censorship model. Assume we observe a right-censored
sample $(X_1,\delta_1),\dots,(X_n,\delta_n)$. Here, $X_i = \min(T_i, Y_i)$ and $\delta_i = 1_{T_i \le Y_i}$, where
the $T_i$'s are nonnegative i.i.d. failure times and the $Y_i$'s are i.i.d. censoring
times independent of the $T_i$'s. Assume that the common distribution
function $F$ of the $T_i$'s is absolutely continuous with density function $f$ and that
we aim at estimating the failure rate $\lambda = f/(1-F)$ on $[0,1]$. Let $N_n$ be the
Nelson-Aalen estimator, defined as follows: if $t_1 < \cdots < t_k$ are the distinct
times when we observe uncensored data and $n_j$ is the number of $X_i$ that are
greater than or equal to $t_j$, then $N_n$ is constant on each $[t_j, t_{j+1})$ with

$$N_n(t_j) = \sum_{i \le j} \frac{1}{n_i}.$$

Moreover, $N_n(t) = 0$ for all $t < t_1$ and $N_n(t) = N_n(t_k)$ for all $t \ge t_k$. Let $\Lambda_n$
be the restriction of $N_n$ to $[0,1]$ and $G$ be the common distribution function
of the $Y_i$'s. The monotone estimator based on $\Lambda_n$ is the Huang-Wellner
estimator and we have the following.

Theorem 3. Assume (A1), $F(1) < 1$ and $\lim_{t\uparrow 1} G(t) < 1$.
(i) Then (A2), (A2') and (A3) hold.
(ii) Assume, moreover, $\inf_{t\in[0,1]} \lambda(t) > 0$ and $G$ has a bounded continuous
first derivative on $(0,1)$. Then (A4) holds with

$$L(t) = \int_0^t \frac{\lambda(u)}{(1-F(u))(1-G(u))}\,du, \qquad t \in [0,1]. \tag{7}$$

Note that in the case of nonrandom censoring times $Y_i \equiv 1$, one has $G(u) = 0$
for all $u < 1$, so $L$ reduces to $L = (1-F)^{-1} - 1$.
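
The Nelson-Aalen estimator of this subsection can be sketched as follows. This is a minimal illustration that assumes the usual increments $1/n_j$ at the distinct uncensored times; the function name and the representation of the sample as `(x, delta)` pairs are choices made for this example.

```python
def nelson_aalen(observations):
    """Jump times and values of the Nelson-Aalen estimator.

    observations : list of pairs (x_i, delta_i), with delta_i = 1 when the
                   observation is uncensored and 0 otherwise.
    Returns (times, values): the estimator is 0 before times[0] and equals
    values[j] on [times[j], times[j+1]).
    """
    event_times = sorted({x for x, delta in observations if delta == 1})
    times, values = [], []
    total = 0.0
    for t in event_times:
        # n_j: number of observations still at risk at time t_j
        at_risk = sum(1 for x, _ in observations if x >= t)
        total += 1.0 / at_risk
        times.append(t)
        values.append(total)
    return times, values


sample = [(0.2, 1), (0.4, 0), (0.5, 1), (0.9, 1)]
times, values = nelson_aalen(sample)
```

Restricting the resulting step function to $[0,1]$ gives $\Lambda_n$, and the left-hand slope of its least concave majorant (or greatest convex minorant) is then the monotone estimator of the failure rate.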

3.2. The Poisson process model. Assume we observe i.i.d. inhomogeneous
Poisson processes $N_1,\dots,N_n$, and their common mean function $\Lambda$ is
differentiable on $[0,1]$ with derivative $\lambda$. Let $\Lambda_n$ be the restriction of $\sum_i N_i/n$
to $[0,1]$. Then we have the following.

Theorem 4. Assume (A1), $\Lambda(1) < \infty$ and $\inf_{t\in[0,1]} \lambda(t) > 0$. Then (A2),
(A2'), (A3) and (A4) hold with $L = \Lambda$.

3.3. The regression model. Assume we observe $y_{i,n} = \lambda(i/n) + \varepsilon_{i,n}$, $i =
1,\dots,n$, where the $\varepsilon_{i,n}$'s are independent random variables with mean zero.
Let

$$\Lambda_n(t) = \frac{1}{n}\sum_{i \le nt} y_{i,n}, \qquad t \in [0,1].$$

Then the monotone estimator based on $\Lambda_n$ is (a slight modification of) the
Brunk estimator and we have the following.

Theorem 5. Assume (A1) and $\sup_{i,n} E|\varepsilon_{i,n}|^q \le c_q$ for some $q > 2$ and
$c_q > 0$.

(i) Then (A2), (A2') and (A3) hold.

(ii) Assume, moreover, $q > 12$ and $\operatorname{var}(\varepsilon_{i,n}) = \sigma^2(i/n)$ for some $\sigma^2 : [0,1] \to
\mathbb{R}_+$. If $\sigma^2$ has a bounded first derivative and satisfies $\inf_t \sigma^2(t) > 0$, then (A4)
holds with $L(t) = \int_0^t \sigma^2(u)\,du$.

In particular, if the $\varepsilon_{i,n}$'s are i.i.d. with a finite moment of order $q > 12$ and
variance $\sigma^2 > 0$, then $L$ reduces to $L(t) = t\sigma^2$. Thus, we recover Theorems
1 and 2 of [4].
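
In the regression model, the least-squares estimator under a monotonicity constraint can be computed with the pool adjacent violators algorithm mentioned in the Introduction. The sketch below is the standard equal-weight version for a nonincreasing fit, given as an illustration rather than as the construction used in the proofs; the function name is chosen for this example.

```python
def pava_decreasing(y):
    """Least-squares nonincreasing fit to the sequence y (equal weights),
    computed by pooling adjacent violators."""
    blocks = []  # each block stores [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Pool while the previous block mean is smaller than the last one,
        # which would violate the nonincreasing constraint.
        while (len(blocks) >= 2
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every index in a block gets the block mean
    return fit


fitted = pava_decreasing([3, 1, 2, 0])
```

Here the violating pair (1, 2) is pooled into its mean 1.5, which gives the fit 3, 1.5, 1.5, 0. The fitted values are the slopes of the least concave majorant of the cumulative sum diagram of the observations, which is the form $\tilde\lambda_n$ discussed at the end of Section 2.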

3.4. The density model. Assume we observe independent random
variables $X_1,\dots,X_n \in [0,1]$ with common distribution function $\Lambda$ and density
function $\lambda = \Lambda'$. Then, the monotone estimator based on the empirical
distribution function of $X_1,\dots,X_n$ is the Grenander estimator and we have the
following.

Theorem 6. Assume (A1) and $\inf_t \lambda(t) > 0$. Then (A2), (A2'), (A3)
and (A4) hold with $L = \Lambda$.

In particular, we recover Theorem 1.1 of [8] and Theorem 1.1 of [11].

4. Proof of Theorem 1. We assume here $\lambda$ is decreasing. The similar
proof in the increasing case is omitted. We denote by $K$ or $K'$ (resp. $c$) a
positive number that depends only on $\lambda$, $C$ and $p$ and that can be chosen
as large (resp. small) as we wish. The same letter may denote different
constants in different formulas.

First, we give upper bounds for the tail probabilities of the inverse process.
Recall that for every nonincreasing left-continuous function $h : [0,1] \to \mathbb{R}$,
the (generalized) inverse of $h$ is defined as follows: for every $a \in \mathbb{R}$, $h^{-1}(a)$
is the greatest $t \in [0,1]$ that satisfies $h(t) \ge a$, with the convention that the
supremum of an empty set is zero. Let $\Lambda_n^+$ be the upper version of $\Lambda_n$ defined
as follows: $\Lambda_n^+(0) = \Lambda_n(0)$ and for every $t \in (0,1]$,

$$\Lambda_n^+(t) = \max\Big\{\Lambda_n(t), \lim_{u\uparrow t}\Lambda_n(u)\Big\}.$$

Setting $U_n = (\hat\lambda_n)^{-1}$, one can check that

$$U_n(a) = \operatorname*{argmax}_{u\in[0,1]}\{\Lambda_n^+(u) - au\} \quad\text{for all } a \in \mathbb{R}, \tag{8}$$

where argmax denotes the greatest location of the maximum (which is
achieved). Moreover, for any $a \in \mathbb{R}$ and $t \in (0,1]$, one has $U_n(a) \ge t$ if and
only if $a \le \hat\lambda_n(t)$. Hereafter, $g = \lambda^{-1}$.
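
The representation (8) is straightforward to evaluate for a step function, since the maximum of $\Lambda_n^+(u) - au$ over $[0,1]$ is attained at 0, at 1 or at a jump point. The sketch below assumes that the jump points lie strictly inside $(0,1)$; the function name and argument layout are choices made for this illustration.

```python
def inverse_process(jump_times, values, value_at_zero, a):
    """Greatest maximiser over [0, 1] of Lambda_n^+(u) - a*u.

    The cadlag step function equals value_at_zero on [0, jump_times[0]) and
    values[j] on [jump_times[j], jump_times[j+1]); jump_times are increasing
    and assumed to lie strictly inside (0, 1).
    """
    candidates = [(0.0, value_at_zero)]
    left = value_at_zero
    for t, v in zip(jump_times, values):
        candidates.append((t, max(v, left)))  # upper version at a jump point
        left = v
    candidates.append((1.0, left))  # no jump at 1, so the upper version is Lambda_n(1)
    best_u, best_score = None, None
    for u, val in candidates:  # candidates are visited in increasing order of u
        score = val - a * u
        if best_score is None or score >= best_score:  # ">=" keeps the greatest location
            best_u, best_score = u, score
    return best_u


jumps = [0.25, 0.5, 0.75]
levels = [0.5, 0.8, 1.0]
u_mid = inverse_process(jumps, levels, 0.0, 1.0)
```

With these inputs, `u_mid` is 0.5; very large values of $a$ give 0 and negative values of $a$ give 1, in line with the fact that $U_n$ is nonincreasing.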

Lemma 2. There exists $K > 0$ such that, for every $a \in \mathbb{R}$ and $x > 0$,

$$P[|U_n(a) - g(a)| \ge x] \le \frac{K}{nx^3}. \tag{9}$$

Proof. Fix $a \in \mathbb{R}$, $x \ge n^{-1/3}$ and denote by $P_x$ the probability in (9).
By (8), we can have $|U_n(a) - g(a)| > x/2$ only if there exists $u \in [0,1]$ with
$|u - g(a)| > x/2$ and $\Lambda_n^+(u) - au \ge \Lambda_n^+(g(a)) - ag(a)$, whence

$$P_x \le P\Big[\sup_{|u-g(a)|>x/2}\{\Lambda_n^+(u) - au\} \ge \Lambda_n^+(g(a)) - ag(a)\Big].$$

But $\Lambda_n$ is cadlag and $\Lambda_n^+ \ge \Lambda_n$, so the previous inequality remains true with
$\Lambda_n^+$ replaced by $\Lambda_n$. Let $c$ satisfy $0 < c < \inf_t |\lambda'(t)|/2$. If $\lambda(g(a)) \ne a$, then
either $a > \lambda(g(a))$ and $g(a) = 0$, or $a < \lambda(g(a))$ and $g(a) = 1$. Hence, from
Taylor's expansion,

$$\Lambda(u) - \Lambda(g(a)) \le (u - g(a))a - c(u - g(a))^2 \tag{10}$$

for all $u \in [0,1]$, whence

$$P_x \le P\Big[\sup_{|u-g(a)|>x/2}\{M_n(u) - M_n(g(a)) - c(u - g(a))^2\} \ge 0\Big]. \tag{11}$$

It then follows from Markov's inequality and (A2) that

$$P_x \le \sum_{k\ge 0} P\Big[\sup_{|u-g(a)|\in[x2^{k-1},\,x2^k]}\{M_n(u) - M_n(g(a))\} \ge c(x2^{k-1})^2\Big]
\le \sum_{k\ge 0}\frac{Cx2^k/n}{(c(x2^{k-1})^2)^2}.$$

But $\sum_k 2^{-3k}$ is finite so (9) holds for all $x \ge n^{-1/3}$. This inequality clearly
extends to all $x > 0$ since the upper bound is greater than one for all $x <
n^{-1/3}$, provided $K > 1$. □



Lemma 3. There exists $K > 0$ such that, for every $x > 0$ and $a \notin \lambda([0,1])$,

$$P[|U_n(a) - g(a)| \ge x] \le \frac{K}{nx(\lambda(g(a)) - a)^2}. \tag{12}$$

Proof. We argue as above except that we use (A2') instead of (A2),
and instead of (10), we use the fact that $\Lambda(u) - \Lambda(g(a)) \le (u - g(a))\lambda(g(a))$.
□

Now we prove Theorem 1. Let $t \in (0,1)$. By the Fubini theorem,

$$I_1 := E[(\hat\lambda_n(t) - \lambda(t))_+]^p = \int_0^\infty P[\hat\lambda_n(t) - \lambda(t) \ge x]\,px^{p-1}\,dx,$$

where for all $x \in \mathbb{R}$, $x_+ = \max(x,0)$. We have $U_n(\lambda(t) + x) \ge t$ whenever
$\hat\lambda_n(t) \ge \lambda(t) + x$, whence

$$I_1 \le \int_0^\infty P[U_n(\lambda(t) + x) \ge t]\,px^{p-1}\,dx.$$

By (A1), there exists $c > 0$ such that $g(\lambda(t) + x) \le t - cx$ for every number
$x$ that satisfies $\lambda(t) + x \in (\lambda(1), \lambda(0))$. As a probability is no more than one,
it thus follows from Lemma 2 that

$$I_1 \le Kn^{-p/3} + \int_{\lambda(0)-\lambda(t)}^\infty P[U_n(\lambda(t) + x) \ge t]\,px^{p-1}\,dx. \tag{13}$$

One has $g(\lambda(t) + x) = 0$ for all $x > \lambda(0) - \lambda(t)$, so Lemma 2 yields
Assume $t \ge n^{-1/3}$. Combining this with (12) yields

$$I_1 \le Kn^{-p/3} + \int_{\lambda(0)-\lambda(t)+n^{-1/3}}^\infty \frac{K}{nt(\lambda(0) - \lambda(t) - x)^2}\,px^{p-1}\,dx.$$

As $p < 2$, we obtain $I_1 \le Kn^{-p/3}$. Now assume $t \le n^{-1/3}$. Then $n^{-1/3} \le
(nt)^{-1/2}$. A probability is no more than one, so (13) and (12) yield

$$I_1 \le K(nt)^{-p/2} + \int_{\lambda(0)-\lambda(t)+(nt)^{-1/2}}^\infty \frac{K}{nt(\lambda(0) - \lambda(t) - x)^2}\,px^{p-1}\,dx.$$

As $p < 2$, we obtain $I_1 \le K(nt)^{-p/2}$. In both cases,

$$I_1 \le K(n^{-p/3} + (nt)^{-p/2}).$$

Similarly,

$$I_2 := E[(\lambda(t) - \hat\lambda_n(t))_+]^p \le K(n^{-p/3} + (n(1-t))^{-p/2})$$

and the result follows.

5. Proof of Theorem 2. We assume here $\lambda$ is decreasing. The similar
proof in the increasing case is omitted. We denote by $K$ or $K'$ (resp. $c$)
a positive number that depends only on $\lambda$, $C$, $p$, $C_q$, $q$, $L$, and that can
be chosen as large (resp. small) as we wish. The same letter may denote
different constants in different formulas. Moreover, we denote by $U_n$ the
inverse process (8) and we set $g = \lambda^{-1}$. We first provide in Lemma 4 an
upper bound for the tail probability of $U_n$, which is sharper than (9). Then,
thanks to Proposition 1 in [4], we prove two lemmas that will be useful to
approximate a properly normalized version of $U_n(a)$ with the location of the
maximum of a drifted Brownian motion. Finally, we prove Theorem 2.

Lemma 4. There exists $K > 0$ such that, for every $a \in \mathbb{R}$ and $x > 0$,

$$P[|U_n(a) - g(a)| \ge x] \le K(nx^3)^{1-q}. \tag{14}$$

Proof. Fix $a \in \mathbb{R}$, $x \in (0,1]$ and denote by $P_x$ the probability in (14).
From (11), one has $P_x \le P'_x + P''_x$, where $P'_x$ is equal to

$$P\Big(\sup_{|u-g(a)|>x/2}\Big\{n^{-1/2}(B_n\circ L(u) - B_n\circ L(g(a))) - \frac{c}{2}(u - g(a))^2\Big\} \ge 0\Big)$$

and

$$P''_x = P\Big(\sup_{u\in[0,1]}|M_n(u) - n^{-1/2}B_n\circ L(u)| > \frac{cx^2}{16}\Big).$$

One can derive from the properties of Brownian motion and the Brownian
bridge (see, e.g., (24) below and the proof of Theorem 4 in [4]) that, for all
$x \in (0,1]$,

$$P'_x \le K\exp(-cnx^3) \le K(nx^3)^{1-q}.$$

Now by (A4), there exists $K > 0$ with

$$P''_x \le Kx^{-2q}n^{1-q} \le K(nx^3)^{1-q}.$$

Hence, (14) holds for all $x \in (0,1]$. It clearly extends to all $x > 0$ since both
$U_n(a)$ and $g(a)$ belong to $[0,1]$. □

Lemma 5. Let $T_n > 0$, $W_n$ be a standard two-sided Brownian motion,
$D_n : [-T_n, T_n] \to \mathbb{R}$ a nonrandom function and $R_n$ a process indexed by $[-T_n, T_n]$.
Furthermore, let

$$U_n = \operatorname*{argmax}_{[-T_n,T_n]}\{D_n + W_n + R_n\} \quad\text{and}\quad V_n = \operatorname*{argmax}_{[-\log n,\log n]}\{D_n + W_n\}.$$

Assume $D_n$ continuously differentiable, $D_n(0) = 0$ and there exist positive
$A$ and $c$ such that $|D'_n(u)| \le A|u|$ and $D_n(u) \le -cu^2$ for all $u \in [-T_n, T_n]$.
Assume, moreover, either (i) or (ii), where:

(i) $T_n = n^{1/(3(6q-11))}$ for some $q > 12$ and there exists $K > 0$ such that

$$P\Big[\sup_{u\in[-T_n,T_n]}|R_n(u)| > x\Big] \le Kx^{-q}n^{1-q/3} \quad\text{for all } x \in (0, n^{2/3}]. \tag{15}$$

(ii) $T_n = \log n$ and there exist $K > 0$ and $s > 3/4$ with

$$\sup_{u\in[-T_n,T_n]}|R_n(u)| \le Kn^{-s/3}(\log n)^3.$$

Let $r = 2(q-1)/(2q-3)$ under (i) and $r < 2s$ under (ii). Then there exists
$K' > 0$ that depends only on $K$, $A$, $c$ and $r$ such that

$$E|U_n - V_n|^r \le K'\Big(\frac{n^{-1/6}}{\log n}\Big)^r.$$

Proof. Assume (i). Assume, moreover, $n$ is large enough so that $T_n >
\log n$. If $\tilde V_n$ denotes the greatest location of the maximum of $D_n + W_n$ on
$[-T_n, T_n]$, then $V_n$ can differ from $\tilde V_n$ only if $|\tilde V_n| > \log n$. It thus follows from
Proposition 1 in [4] (see also the comments just above this proposition)
that there exists an absolute constant $C$ such that the probability that
$|U_n - V_n| > \delta$ is no more than

$$P\Big(\sup_{u\in[-T_n,T_n]}|R_n(u)| > \frac{x\delta^{3/2}}{2}\Big) + Cx\log n + 2P(|\tilde V_n| > \log n)$$
for every $(x,\delta)$ that satisfies

(16) * 6 (0,logn],,>0, ^o g nf< 25i ^ i/2x5) . 
Moreover, for every $x > 0$,

$$P(|\tilde V_n| > x) \le 2\exp(-c^2x^3/2); \tag{17}$$

see, for example, Theorem 4 in [4]. Let $\varepsilon > 0$ and for every $\delta > 0$, set

$$x_\delta = (\log n)^{-1/(q+1)}\,\delta^{-3q/(2(q+1))}\,n^{(3-q)/(3(q+1))}.$$

Then (16) holds for every $(\delta, x_\delta)$ with $\delta \in (n^{-1/6}/\log n, n^{-\varepsilon})$, provided $n$ is
large enough. By (15), there thus exists $K' > 0$ such that, for every such $\delta$,

$$P(|U_n - V_n| > \delta) \le K'x_\delta\log n. \tag{18}$$

Now, $|U_n - V_n| \le 2T_n$, so from Fubini's theorem

$$E|U_n - V_n|^r = \int_0^{2T_n} P(|U_n - V_n| > \delta)\,r\delta^{r-1}\,d\delta.$$

But for every $\delta \ge n^{-\varepsilon}$, $|U_n - V_n|$ can exceed $\delta$ only if it exceeds $n^{-\varepsilon}$ and
therefore, the above integral is no more than

$$\Big(\frac{n^{-1/6}}{\log n}\Big)^r + K'\log n\int_{n^{-1/6}/\log n}^{n^{-\varepsilon}} x_\delta\,r\delta^{r-1}\,d\delta + K'x_{n^{-\varepsilon}}\log n\,(2T_n)^r.$$

Since $r < 3q/(2(q+1))$, straightforward computations prove that this is of
order $O((n^{-1/6}/\log n)^r)$, provided $q > 12$ and $\varepsilon$ is small enough. This
completes the proof in the case (i).
Assume (ii). For every $\delta > 0$, let

$$x_\delta = 2K\delta^{-3/2}n^{-s/3}(\log n)^3.$$

Arguing as above, we get (18) for every $\delta \in (n^{-1/6}/\log n, n^{-\varepsilon})$. We conclude
with the same arguments, since $s > 3/4$ and $r < 2s$. □

Lemma 6. Let $U_n$ and $V_n$ be processes indexed by $J_n \subset [x_0, x_1]$ for some
real numbers $x_0$ and $x_1$ independent of $n$. Let $p \ge 1$, $r > 1$ and let $r'$ satisfy
$1/r = 1 - 1/r'$. Assume there are $q'$ and $K$ such that

$$\sup_{a\in J_n} E|U_n(a)|^{q'} \le K \quad\text{and}\quad \sup_{a\in J_n} E|V_n(a)|^{q'} \le K \tag{19}$$

for all $n$. Assume, moreover, either (i) or (ii), where:

(i) $q' = (p-1)r'$ and $\sup_{a\in J_n} E|U_n(a) - V_n(a)|^r = o(n^{-r/6})$.

(ii) $q' = pr'$ and there exist $\gamma > r/6$ and $K' > 0$ such that, for every $n$
and $a \in J_n$, $P(U_n(a) \ne V_n(a)) \le K'n^{-\gamma}$.

Then

$$\int_{J_n}|U_n(a)|^p\,da = \int_{J_n}|V_n(a)|^p\,da + o_P(n^{-1/6}).$$

Proof. It follows from Taylor's expansion that

$$|x^p - y^p| \le p|x - y|(x \vee y)^{p-1} \le p|x - y|(x^{p-1} + y^{p-1}) \tag{20}$$

for all positive numbers $x$ and $y$. Hence, for every $a \in J_n$,

$$E\big||U_n(a)|^p - |V_n(a)|^p\big| \le pE[|U_n(a) - V_n(a)|(|U_n(a)|^{p-1} + |V_n(a)|^{p-1})].$$

Also,

$$E\big||U_n(a)|^p - |V_n(a)|^p\big| \le E[1_{U_n(a)\ne V_n(a)}(|U_n(a)|^p + |V_n(a)|^p)].$$

Hence, the result follows from Hölder's inequality. □

Now we turn to the proof of the theorem. Hereafter,

$$\mathcal{J}_n = n^{p/3}\int_0^1|\hat\lambda_n(t) - \lambda(t)|^p\,dt.$$

• Step 1. First we express $\mathcal{J}_n$ in terms of $U_n$. Precisely, we prove

$$\mathcal{J}_n = n^{p/3}\int_{\lambda(1)}^{\lambda(0)}|U_n(a) - g(a)|^p|g'(a)|^{1-p}\,da + o_P(n^{-1/6}). \tag{21}$$

For every $x \in \mathbb{R}$, let $x_+ = \max(x,0)$. Moreover, let

$$I_1 = \int_0^1[(\hat\lambda_n(t) - \lambda(t))_+]^p\,dt, \qquad I_2 = \int_0^1[(\lambda(t) - \hat\lambda_n(t))_+]^p\,dt$$

and

$$J_1 = \int_0^1\int_0^{(\lambda(0)-\lambda(t))^p} 1_{\hat\lambda_n(t)\ge\lambda(t)+a^{1/p}}\,da\,dt.$$

We have $\hat\lambda_n(t) \le \lambda(0)$ for all $t > U_n(\lambda(0))$, so

$$0 \le I_1 - J_1 = \int_0^{U_n(\lambda(0))}\int_{(\lambda(0)-\lambda(t))^p}^\infty 1_{\hat\lambda_n(t)\ge\lambda(t)+a^{1/p}}\,da\,dt
\le \int_0^{U_n(\lambda(0))}[(\hat\lambda_n(t) - \lambda(t))_+]^p\,dt.$$

Hence, by monotonicity

$$I_1 - J_1 \le \int_0^{n^{-1/3}\log n}(\hat\lambda_n(t) - \lambda(t))_+^p\,dt + |\hat\lambda_n(0) - \lambda(1)|^p\,1_{n^{1/3}U_n(\lambda(0))>\log n}.$$

Let $p' \in (p - 1/2, 2)$ be such that $1 \le p' \le p$ (such a $p'$ exists since $p \in [1,5/2)$).
By assumption, $\lambda$ is bounded and $\hat\lambda_n(0)$ is stochastically bounded, so, from
Lemma 4,

$$I_1 - J_1 \le |\hat\lambda_n(0) - \lambda(1)|^{p-p'}\int_0^{n^{-1/3}\log n}|\hat\lambda_n(t) - \lambda(t)|^{p'}\,dt + o_P(n^{-p/3-1/6}).$$

Now, note that the results in Theorem 1 remain true under the assumptions
of Theorem 2 (since one can use Lemma 4 instead of Lemma 2 in the proof).
As $p' \in [1,2)$, we get

$$E\Big[\int_0^{n^{-1/3}\log n}|\hat\lambda_n(t) - \lambda(t)|^{p'}\,dt\Big] \le Kn^{-(1+p')/3}\log n.$$

But $p' > p - 1/2$, so

$$\int_0^{n^{-1/3}\log n}|\hat\lambda_n(t) - \lambda(t)|^{p'}\,dt = o_P(n^{-p/3-1/6}).$$

Therefore, $I_1 = J_1 + o_P(n^{-p/3-1/6})$. The change of variable $b = \lambda(t) + a^{1/p}$
then yields

$$I_1 = \int_{\lambda(1)}^{\lambda(0)}\int_{g(b)}^{U_n(b)} p(b - \lambda(t))^{p-1}\,1_{g(b)<U_n(b)}\,dt\,db + o_P(n^{-p/3-1/6}).$$

By Taylor's expansion, (A1) and (4), there exists $K > 0$ such that

$$\big|[b - \lambda(t)]^{p-1} - [(g(b) - t)\lambda'\circ g(b)]^{p-1}\big| \le K(t - g(b))^{p-1+s} \tag{22}$$

for all $b \in (\lambda(1), \lambda(0))$ and $t \in (g(b), 1)$. As a probability is no more than one,
integrating (14) proves that, for every $q' < 3(q-1)$, there exists $K_{q'} > 0$ with

$$E[(n^{1/3}|U_n(a) - g(a)|)^{q'}] \le K_{q'} \quad\text{for all } a \in \mathbb{R}. \tag{23}$$

Thus, $I_1$ equals

$$\int_{\lambda(1)}^{\lambda(0)}\int_{g(b)}^{U_n(b)} p(t - g(b))^{p-1}|\lambda'\circ g(b)|^{p-1}\,1_{g(b)<U_n(b)}\,dt\,db + R + o_P(n^{-p/3-1/6}),$$

where $R = O_P(n^{-(p+s)/3})$. Hence,

$$I_1 = \int_{\lambda(1)}^{\lambda(0)}|U_n(b) - g(b)|^p|\lambda'\circ g(b)|^{p-1}\,1_{g(b)<U_n(b)}\,db + o_P(n^{-p/3-1/6}).$$

Likewise,

$$I_2 = \int_{\lambda(1)}^{\lambda(0)}|g(b) - U_n(b)|^p|\lambda'\circ g(b)|^{p-1}\,1_{g(b)>U_n(b)}\,db + o_P(n^{-p/3-1/6})$$

and the result follows, since $\mathcal{J}_n = n^{p/3}(I_1 + I_2)$.

• Step 2. Now we approximate a proper normalization of $U_n$ by $V$, defined
as follows. We have the representation

$$B_n(t) = W_n(t) - \xi_n t, \tag{24}$$

where $W_n$ is a standard Brownian motion, $\xi_n = 0$ if $B_n$ is a Brownian motion
and $\xi_n$ is a standard Gaussian variable independent of $B_n$ if $B_n$ is a Brownian
bridge. Let $d = |\lambda'|/(2(L')^2)$, and for every $t \in [0,1]$ let

$$W_t(u) = n^{1/6}[W_n(L(t) + n^{-1/3}u) - W_n(L(t))], \tag{25}$$

so that $W_t$ is a standard Brownian motion. For every $t \in [0,1]$, we define
$V(t)$ as the location of the maximum of the drifted Brownian motion $u \mapsto
-d(t)u^2 + W_t(u)$ over $[-\log n, \log n]$. We aim at proving

$$\mathcal{J}_n = \int_0^1\Big|V(t) - n^{-1/6}\frac{\xi_n}{2d(t)}\Big|^p\Big|\frac{\lambda'(t)}{L'(t)}\Big|^p\,dt + o_P(n^{-1/6}). \tag{26}$$


For every $a \in \mathbb{R}$, let $a^\xi = a - n^{-1/2}\xi_n L'(g(a))$. The process $U_n$ is
nonincreasing and $|\xi_n|$ is less than $\log n$ with probability greater than $1 - \exp(-(\log n)^2/2)$.
As $L'$ is bounded, we derive from Lemma 4 that

$$P(|L(U_n(a^\xi)) - L(g(a))| > x) \le K(nx^3)^{1-q} \tag{27}$$

for all $x \in [n^{-1/3}, L(1) - L(0)]$ and large enough $n$. With a modification of
$K$, this inequality holds for all $x > 0$ and $n \in \mathbb{N}$. As a probability is no more
than one, integrating this inequality yields

$$\sup_{a\in\mathbb{R}} E[(n^{1/3}|L(U_n(a^\xi)) - L(g(a))|)^{q'}] \le K, \tag{28}$$

provided $q' < 3(q-1)$. Recall (21). Then Lemma 6(i) with, for example,
$r = r' = 2$ combined with Hölder's inequality and the change of variable
$a \to a^\xi$ proves that

$$\mathcal{J}_n = n^{p/3}\int_{\lambda(1)}^{\lambda(0)}\Big|\frac{L(U_n(a)) - L(g(a))}{L'(g(a))}\Big|^p|g'(a)|^{1-p}\,da + o_P(n^{-1/6})$$
$$= n^{p/3}\int_{J_n}|L(U_n(a^\xi)) - L(g(a^\xi))|^p\frac{|g'(a)|^{1-p}}{(L'(g(a)))^p}\,da + o_P(n^{-1/6}),$$

where

$$J_n = [\lambda(1) + n^{-1/6}/\log n,\ \lambda(0) - n^{-1/6}/\log n].$$

Let $a \in J_n$. By (8),

$$L(U_n(a^\xi)) = \operatorname*{argmax}_{u\in[L(0),L(1)]}\{(\Lambda_n^+\circ L^{-1} - a^\xi L^{-1})(u)\}.$$

The location of the maximum of a process $\{Z(u), u \in I\}$ is also the location
of the maximum of $\{AZ(u) + B, u \in I\}$ for any $A > 0$ and $B \in \mathbb{R}$. Therefore,

$$n^{1/3}(L(U_n(a^\xi)) - L(g(a))) = \operatorname*{argmax}_{u\in I_n(a)}\{D_n(a,u) + W_{g(a)}(u) + R_n(a,u)\},$$

where $W_{g(a)}$ is given by (25),

$$I_n(a) = [-n^{1/3}(L(g(a)) - L(0)),\ n^{1/3}(L(1) - L(g(a)))],$$
$$D_n(a,u) = n^{2/3}(\Lambda\circ L^{-1} - aL^{-1})(L(g(a)) + n^{-1/3}u) - n^{2/3}(\Lambda(g(a)) - ag(a))$$

and $R_n(a,u)$ is equal to

$$n^{2/3}(a - a^\xi)(L^{-1}(L(g(a)) + n^{-1/3}u) - g(a)) - n^{-1/6}\xi_n u + \tilde R_n(a,u)$$

for some $\tilde R_n$ which satisfies

$$\sup_{a\in\mathbb{R},\,u\in I_n(a)}|\tilde R_n(a,u)| \le n^{2/3}\sup_{t\in[0,1]}|\Lambda_n^+(t) - \Lambda(t) - n^{-1/2}B_n\circ L(t)|.$$

We will use Lemma 5 to show that $R_n$ is negligible. For this task, we need
to localize. Let $T_n = n^{1/(3(6q-11))}$ and

$$\hat U_n(a) = \operatorname*{argmax}_{u\in[-T_n,T_n]}\{D_n(a,u) + W_{g(a)}(u) + R_n(a,u)\}.$$

If $n$ is large enough, then $[-T_n, T_n] \subset I_n(a)$ for all $a \in J_n$, so

$$n^{1/3}(L(U_n(a^\xi)) - L(g(a)))$$

can differ from $\hat U_n(a)$ only if its absolute value exceeds $T_n$. It thus follows
from (27) and (28) that we can apply Lemma 6(ii) with some $r' < 3(q-1)/p$,
$r'$ as close as possible to $3(q-1)/p$. We get


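$$\mathcal{J}_n = \int_{J_n}\Big|\hat U_n(a) + n^{1/3}\big(L(g(a)) - L(g(a^\xi))\big)\Big|^p\frac{|g'(a)|^{1-p}}{(L'(g(a)))^p}\,da + o_P(n^{-1/6})$$
$$= \int_{J_n}\Big|\hat U_n(a) - n^{-1/6}\frac{\xi_n}{2d(g(a))}\Big|^p\frac{|g'(a)|^{1-p}}{(L'(g(a)))^p}\,da + o_P(n^{-1/6}).$$

Now let

$$\tilde U_n(a) = \operatorname*{argmax}_{u\in[-\log n,\log n]}\{D_n(a,u) + W_{g(a)}(u)\}.$$

By Taylor's expansion, there are positive $K$ and $c$ with

$$\Big|\frac{\partial}{\partial u}D_n(a,u)\Big| \le K|u| \quad\text{and}\quad D_n(a,u) \le -cu^2$$

for every $a \in J_n$ and $u \in [-T_n, T_n]$. Moreover, there exists $K > 0$ with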

|# n (a,u)| ^J^n-^l&l + n 2 / 3 sup |A n (t) - A(t) - n" 1/2 B ?t o L(t)|, 

te[o,l] 
since $\Lambda_n$ is cadlag. By (A4), (15) thus holds with $R_n(u)$ replaced by $R_n(a,u)$.

Due to Theorem 4 in [4], $\tilde U_n(a)$ has bounded moments of any order, so we
can apply Lemmas 5 and 6 both with condition (i) to get

$$\mathcal{J}_n = \int_{J_n}\Big|\tilde U_n(a) - n^{-1/6}\frac{\xi_n}{2d(g(a))}\Big|^p\frac{|g'(a)|^{1-p}}{(L'(g(a)))^p}\,da + o_P(n^{-1/6}).$$

Now we approximate $\tilde U_n(a)$ by $V(g(a))$. By Taylor's expansion and (4),
there exists $K$ such that, for all $|u| \le \log n$,

$$|D_n(a,u) + d(g(a))u^2| \le Kn^{-s/3}(\log n)^3.$$

It follows from (17) that $V(t)$ has bounded moments of any order so Lemma
5(ii) and Lemma 6(i) show that

$$\mathcal{J}_n = \int_{J_n}\Big|V(g(a)) - n^{-1/6}\frac{\xi_n}{2d(g(a))}\Big|^p\frac{|g'(a)|^{1-p}}{(L'(g(a)))^p}\,da + o_P(n^{-1/6})$$

and (26) follows from the change of variable $t = g(a)$.

• Step 3. Now we prove that, although $B_n$ could be a Brownian bridge
in (A4), everything works as if it were a Brownian motion. This is similar
to Corollary 3.3 in [8] and Lemma 2.2 in [11], but the present argument
takes a simpler form since we deal with $V$. Precisely, we show that $\xi_n$ can
be removed from (26), that is,

$$\mathcal{J}_n = \int_0^1|V(t)|^p\Big|\frac{\lambda'(t)}{L'(t)}\Big|^p\,dt + o_P(n^{-1/6}). \tag{29}$$

This is precisely (26) if $B_n$ is a Brownian motion since, in that case, $\xi_n = 0$.
Hence, we assume here that $B_n$ is a Brownian bridge. Therefore, $\xi_n$ is a
standard Gaussian variable. Let

$$\mathcal{D}_n = n^{1/6}\Big\{\int_0^1\Big|V(t) - n^{-1/6}\frac{\xi_n}{2d(t)}\Big|^p\Big|\frac{\lambda'(t)}{L'(t)}\Big|^p\,dt - \int_0^1|V(t)|^p\Big|\frac{\lambda'(t)}{L'(t)}\Big|^p\,dt\Big\}.$$



We will show that V n = op(l). Hereafter, for every t, V{t) denotes the loca- 
tion of the maximum of the process u i— > —d(t)u 2 + Wt{u) over 1KL Then for 
every t, V(t) can differ from V(t) only if |I^(t)| > logn, so similar to (17), 

(30) F(V{t) ^ V(t)) < 2exp(-c 2 (logn) 3 ). 

Moreover,

$$d(t)^{2/3}V(t)=\operatorname*{argmax}_{u\in\mathbb R}\{-u^2d(t)^{-1/3}+W_t(ud(t)^{-2/3})\},$$

which, by scaling, is distributed as $X(0)$; see (3). Fix $\gamma\in(0,1/12)$. Corollaries 3.4 and 3.3 in [7] show that $X(0)$ has a bounded density function, so from (30),

$$P(|\tilde V(t)|\le n^{-\gamma})\le Kn^{-\gamma}.$$

Here, $K$ does not depend on $t$ since $d$ is bounded. Moreover, $\xi_n$ and $\tilde V(t)$ possess uniformly bounded moments of any order and the probability that $|\xi_n|$ exceeds $\log n$ is less than $\exp(-(\log n)^2/2)$. Expanding $x\mapsto x^p$ around $|\tilde V(t)|$ then proves that $\mathcal D_n$ is asymptotically equivalent to

$$pn^{1/6}\int_0^1 n^{-1/6}\frac{\xi_n}{2d(t)}\,\tilde V(t)|\tilde V(t)|^{p-2}\,\mathbf 1_{A_n(t)}\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p\,dt,$$



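where $A_n(t)$ is the intersection of the events $\{|\tilde V(t)|>n^{-\gamma}\}$ and $\{|\xi_n|\le\log n\}$. Hence,

$$\mathcal D_n=p\,\xi_n\int_0^1\frac{1}{2d(t)}\,\tilde V(t)|\tilde V(t)|^{p-2}\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p\,dt+o_P(1).$$

Now, $\tilde V(t)$ has a symmetric distribution, so

$$E\Bigl[\Bigl(\int_0^1\frac{1}{2d(t)}\,\tilde V(t)|\tilde V(t)|^{p-2}\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p\,dt\Bigr)^2\Bigr]=\operatorname{var}\Bigl(\int_0^1\frac{1}{2d(t)}\,\tilde V(t)|\tilde V(t)|^{p-2}\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p\,dt\Bigr),$$

and one can prove, arguing as in Step 5 below, that this tends to zero as $n\to\infty$. Thus, the above integral converges to zero in probability. As $\xi_n$ is stochastically bounded, we get $\mathcal D_n = o_P(1)$.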
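The Gaussian tail bound used in Step 3, namely that $|\xi_n|$ exceeds $\log n$ with probability less than $\exp(-(\log n)^2/2)$, can be checked numerically. The following is a minimal sketch, not part of the original argument; the function names are hypothetical and the exact tail is computed from the standard identity $P(|Z|>x)=\operatorname{erfc}(x/\sqrt2)$ for a standard Gaussian $Z$.

```python
import math


def gaussian_two_sided_tail(x: float) -> float:
    """Exact P(|Z| > x) for a standard Gaussian Z, via the complementary error function."""
    return math.erfc(x / math.sqrt(2.0))


def tail_bound(x: float) -> float:
    """The bound exp(-x^2 / 2) invoked in Step 3 with x = log n."""
    return math.exp(-0.5 * x * x)


# Compare the exact tail with the bound at x = log n for a few sample sizes.
for n in (10, 10**3, 10**6, 10**9):
    x = math.log(n)
    print(n, gaussian_two_sided_tail(x), tail_bound(x))
```

In each printed row the exact tail should not exceed the bound, which is all that Step 3 requires.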

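• Step 4. Now, we prove that it is sufficient to show

$$n^{1/6}\int_0^1Y_n(t)\,dt\to\mathcal N(0,\sigma_p^2)\quad\text{in distribution},$$

where

$$Y_n(t)=\bigl(|\tilde V(t)|^p-E|\tilde V(t)|^p\bigr)\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p.$$

We have seen that $d(t)^{2/3}V(t)$ is distributed as $X(0)$, so (30) implies

$$\int_0^1E|\tilde V(t)|^p\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p\,dt=E|X(0)|^p\int_0^1d(t)^{-2p/3}\Bigl|\frac{\lambda'(t)}{L'(t)}\Bigr|^p\,dt+o(n^{-1/6})=m_p+o(n^{-1/6}).$$

Thus, by (29),

$$n^{1/6}(\mathcal J_n-m_p)=n^{1/6}\int_0^1Y_n(t)\,dt+o_P(1),$$

which proves the stated result.

• Step 5. In this step we show

$$\lim_{n\to\infty}\operatorname{var}\Bigl(n^{1/6}\int_0^1Y_n(t)\,dt\Bigr)=\sigma_p^2. \tag{31}$$

Let $v_n=\operatorname{var}\bigl(\int_0^1Y_n(t)\,dt\bigr)$. From Fubini's theorem,

$$v_n=\int_0^1\int_0^1\Bigl|\frac{\lambda'(t)}{L'(t)}\,\frac{\lambda'(s)}{L'(s)}\Bigr|^p\operatorname{cov}\bigl(|\tilde V(t)|^p,|\tilde V(s)|^p\bigr)\,dt\,ds.$$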



Let $c_n=2n^{-1/3}\log n/\inf_tL'(t)$. The increments of $W_n$ are independent, so $\tilde V(t)$ and $\tilde V(s)$ are independent for all $|t-s|\ge c_n$. Moreover, $|\tilde V(t)|$ possesses bounded moments of any order, so

$$v_n=2\int_0^1\int_s^{\min(1,s+c_n)}\Bigl|\frac{\lambda'(s)}{L'(s)}\Bigr|^{2p}\operatorname{cov}\bigl(|\tilde V(t)|^p,|\tilde V(s)|^p\bigr)\,dt\,ds+o(n^{-1/3}). \tag{32}$$



For every $s$ and $t$, let $\tilde V_t(s)$ be the location of the maximum of the process $u\mapsto -d(s)u^2+W_t(u)$ over $[-\log n,\log n]$ and let $V_t(s)$ be the location of the maximum of this process over the whole real line. By (17), $\tilde V_t(s)$ and $V_t(s)$ have bounded moments of any order. Hölder's inequality combined with (20) thus yields

$$\bigl|\operatorname{cov}(|\tilde V_t(t)|^p,|\tilde V_s(s)|^p)-\operatorname{cov}(|\tilde V_t(s)|^p,|\tilde V_s(s)|^p)\bigr|\le K\,E^{1/r}|\tilde V_t(t)-\tilde V_t(s)|^r,$$

where $r>1$ is arbitrary. Since $\tilde V_t(t)=\tilde V(t)$, Lemma 5(ii) yields

$$v_n=2\int_0^1\int_s^{\min(1,s+c_n)}\Bigl|\frac{\lambda'(s)}{L'(s)}\Bigr|^{2p}\operatorname{cov}\bigl(|\tilde V_t(s)|^p,|\tilde V_s(s)|^p\bigr)\,dt\,ds+o(n^{-1/3}).$$

For every fixed $s$, $\tilde V_t(s)$ can differ from $V_t(s)$ only if $|V_t(s)|>\log n$, so similar to (17), we get

$$P(\tilde V_t(s)\ne V_t(s))\le 2\exp(-c^2(\log n)^3).$$

Thus, $\tilde V_t(s)$ and $\tilde V_s(s)$ can be replaced by $V_t(s)$ and $V_s(s)$ in the above integral. Now, fix $s$ and $t$ in $[0,1]$ and let $X$ be given by (3), where

$$W(u)=n^{1/6}d(s)^{1/3}\bigl(W_n(L(s)+n^{-1/3}d(s)^{-2/3}u)-W_n(L(s))\bigr).$$

Then

$$d(s)^{2/3}V_t(s)=X\bigl(n^{1/3}d(s)^{2/3}(L(t)-L(s))\bigr)-n^{1/3}d(s)^{2/3}(L(t)-L(s)).$$

The change of variable $a=n^{1/3}d(s)^{2/3}(L(t)-L(s))$ and straightforward computations then yield (31).

• Step 6. It remains to prove asymptotic normality of $n^{1/6}\int_0^1Y_n(t)\,dt$. We will use Bernstein's method of big blocks and small blocks, as in [8] and [11]. Let $L_n=n^{-1/3}(\log n)^5$, $L'_n=n^{-1/3}(\log n)^2$ and denote by $N_n$ the integer part of $(L_n+L'_n)^{-1}$. Let $a_0=0$, $a_{2N_n+1}=1$ and for all $n\in\mathbb N$ and all $j\in\{0,\dots,N_n-1\}$, let $a_{2j+1}=a_{2j}+L_n$ and $a_{2j+2}=a_{2j+1}+L'_n$. Finally, let $\zeta_{n,j}=n^{1/6}\int_{a_{2j}}^{a_{2j+1}}Y_n(t)\,dt$. By definition, $EY_n(t)=0$, so

$$E\Bigl[\Bigl(\sum_{i=0}^{N_n-1}\int_{a_{2i+1}}^{a_{2i+2}}Y_n(t)\,dt\Bigr)^2\Bigr]=\sum_{i=0}^{N_n-1}\sum_{j=0}^{N_n-1}\int_{a_{2i+1}}^{a_{2i+2}}\int_{a_{2j+1}}^{a_{2j+2}}\operatorname{cov}(Y_n(t),Y_n(s))\,dt\,ds.$$

By independence, the terms with $i\ne j$ are equal to zero for large enough $n$, so the above expectation is of order $o(n^{-1/3})$. Hence, $n^{1/6}\int_0^1Y_n(t)\,dt$ is asymptotically equivalent to $\sum_j\zeta_{n,j}$, and by Step 5, $\operatorname{var}(\sum_j\zeta_{n,j})$ tends to $\sigma_p^2$ as $n\to\infty$. By Hölder's and Markov's inequalities, we have, for all $\delta>0$,

$$\sum_{j=0}^{N_n}E\bigl(\zeta_{n,j}^2\mathbf 1_{|\zeta_{n,j}|>\delta}\bigr)\le\sum_{j=0}^{N_n}E\bigl(|\zeta_{n,j}|^3\bigr)\delta^{-1}.$$

This tends to zero as $n\to\infty$, so the central limit theorem with the Lindeberg condition shows that $\sum_j\zeta_{n,j}$ tends to a centered Gaussian distribution with variance $\sigma_p^2$. By Step 4, this completes the proof of the theorem. □
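The big-block and small-block partition of Step 6 can be made concrete with a short numerical sketch. It only illustrates the construction of the endpoints $a_0,\dots,a_{2N_n+1}$ as described above; the function name `bernstein_blocks` is hypothetical, and a very large $n$ is needed before $L_n=n^{-1/3}(\log n)^5$ drops below 1.

```python
import math


def bernstein_blocks(n: float):
    """Endpoints a_0, ..., a_{2N_n+1} of the big/small block partition of [0, 1].

    Big blocks [a_{2j}, a_{2j+1}] have length L_n = n^(-1/3) (log n)^5 and
    small blocks [a_{2j+1}, a_{2j+2}] have length L'_n = n^(-1/3) (log n)^2.
    """
    log_n = math.log(n)
    big = n ** (-1.0 / 3.0) * log_n ** 5
    small = n ** (-1.0 / 3.0) * log_n ** 2
    num_blocks = int(1.0 / (big + small))  # N_n, the integer part of (L_n + L'_n)^(-1)
    if num_blocks < 1:
        raise ValueError("n is too small: L_n + L'_n exceeds 1")
    endpoints = [0.0]
    for _ in range(num_blocks):
        endpoints.append(endpoints[-1] + big)    # a_{2j+1} = a_{2j} + L_n
        endpoints.append(endpoints[-1] + small)  # a_{2j+2} = a_{2j+1} + L'_n
    endpoints.append(1.0)                        # a_{2N_n+1} = 1
    return endpoints, big, small, num_blocks


endpoints, big, small, num_blocks = bernstein_blocks(1e30)
print(num_blocks, big, small, endpoints[:4])
```

The small blocks are shorter than the big ones by a factor $(\log n)^3$, which is why their total contribution is negligible, while they remain longer than the dependence range $c_n$ of Step 5 by a factor of order $\log n$.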

6. Proof of the results of Section 3. Here again, $K$, $K'$, $c$ denote positive numbers that do not depend on $n$ and may change from line to line.



6.1. Proof of Theorem 3. (i) Let $M_n^*$ be the stopped process

$$M_n^*(t)=M_n(t\wedge X_{(n)})=\Lambda_n(t)-\Lambda(t\wedge X_{(n)}),\qquad t\in[0,1],$$

where $X_{(n)}=\max_iX_i$. We have $X_{(n)}<1$ with probability $\gamma^n$, where

$$\gamma=1-\lim_{t\uparrow1}(1-F(t))(1-G(t))<1. \tag{33}$$

Recall $(a+b)^2\le2a^2+2b^2$ for all real numbers $a$ and $b$. As $M_n^*$ is identical to $M_n$ if $X_{(n)}\ge1$, we get

$$E\Bigl[\sup_{t\le u\le t+x}(M_n(u)-M_n(t))^2\Bigr]\le2E\Bigl[\sup_{t\le u\le t+x}(M_n^*(u)-M_n^*(t))^2\Bigr]+2(Kx)^2\gamma^n$$

for every $t\in[0,1]$ and $x>0$. Here, $K$ denotes the supremum norm of $\lambda$. By Theorem 7.5.2 in [16], $M_n^*$ is a square integrable mean zero martingale with predictable variation process

$$\langle M_n^*\rangle(t)=\frac1n\int_0^t\frac{\lambda(s)}{1-H_{n-}(s)}\,\mathbf 1_{s\le X_{(n)}}\,ds, \tag{34}$$

where $H_{n-}(s)=n^{-1}\sum_i\mathbf 1_{X_i<s}$. By Doob's inequality,



$$\begin{aligned}
E\Bigl[\sup_{t\le u\le t+x}(M_n^*(u)-M_n^*(t))^2\Bigr]&\le4E\bigl[(M_n^*(1\wedge(t+x))-M_n^*(t))^2\bigr]\\
&=4E\bigl[(M_n^*(1\wedge(t+x)))^2-(M_n^*(t))^2\bigr]\\
&=\frac4nE\int_t^{1\wedge(t+x)}\frac{\lambda(s)}{1-H_{n-}(s)}\,\mathbf 1_{s\le X_{(n)}}\,ds.
\end{aligned}$$

Let $N$ be the number of $X_i$'s that are greater than or equal to 1. For every $s\le1\wedge X_{(n)}$, $n(1-H_{n-}(s))$ is greater than or equal to $1\vee N$. Hence, by monotonicity,

$$E\Bigl[\sup_{t\le u\le t+x}(M_n^*(u)-M_n^*(t))^2\Bigr]\le4x\lambda(0)E\Bigl(\frac{1}{1\vee N}\Bigr)\le\frac{Kx}{n}$$

since $N$ has a binomial distribution with parameter $n$ and probability of success $1-\gamma>0$. Also, $\gamma^n\le K/n$ for some $K>0$, and $x^2\le x$ for all $x\in[0,1]$. Hence, for every $t\in[0,1]$ and $x>0$, we have



$$E\Bigl[\sup_{t\le u\le t+x}(M_n(u)-M_n(t))^2\Bigr]\le\frac{Kx}{n}. \tag{35}$$

To handle the case $u<t$, we derive from (35) that

$$\begin{aligned}
E\Bigl[\sup_{t-x\le u\le t}(M_n(u)-M_n(t))^2\Bigr]&\le2E\bigl[(M_n(t)-M_n((t-x)\vee0))^2\bigr]+2E\Bigl[\sup_{t-x\le u\le t}(M_n(u)-M_n((t-x)\vee0))^2\Bigr]\\
&\le\frac{Kx}{n}
\end{aligned}$$

for every $t\in[0,1]$ and $x>0$. Combining this with (35) yields (A2) and (A2'). Now, $\Lambda_n$ jumps only at times $T_i$ when we observe uncensored data. Hence, for every $\delta>0$, the probability that $\Lambda_n$ jumps in $(0,\delta/n)$ or in $(1-\delta/n,1)$ is no more than

$$nP(T_1\in(0,\delta/n)\cup(1-\delta/n,1)).$$

This is no more than $2K\delta$, where $K$ is the supremum norm of $f$ on $[0,1]$, so (A3) follows from Lemma 1.

(ii) Let $L$ be defined by (7), and denote the supremum distance on $[0,1]$ by $\|\cdot\|$. We will prove that there exist versions of $M_n$ and the standard Brownian motion $B_n$ such that, for all $x\in[0,n]$,

$$P\Bigl[n\sup_{t\in[0,1]}|M_n(t)-n^{-1/2}B_n\circ L(t)|>x+K\log n\Bigr]\le K'\exp(-cx), \tag{36}$$

where $K$, $K'$ and $c$ depend only on $F$ and $G$. This indeed suffices to prove (A4). We consider the product-limit estimator $F_n$ of Kaplan and Meier,

$$F_n(t)=1-\prod_{i:\,t_i\le t}\Bigl(1-\frac{1}{n_i}\Bigr),\qquad t\ge0,$$

and we set $\tilde\Lambda_n=-\log(1-F_n)$. By Corollaries 1 and 2 of [12], there are versions of $F_n$ and $B_n$ such that

$$P\bigl[n\|F_n-F-n^{-1/2}(1-F)B_n\circ L\|>x+K\log n\bigr]\le K'\exp(-cx) \tag{37}$$

for all $x\ge0$. Here, $K$, $K'$ and $c$ depend only on $F$ and $G$. As $L$ is bounded on $[0,1]$, we have

$$P\bigl[\|B_n\circ L\|>x\bigr]\le\exp(-c'x^2)$$

for some $c'>0$ and all $x>0$. But $F(1)<1$ and we have (37), so we can assume without loss of generality that $F_n(1)<1$ and, therefore, $\tilde\Lambda_n$ is well defined on the whole interval $[0,1]$. As $\Lambda=-\log(1-F)$, expanding $u\mapsto\exp(-u)$ proves that there are positive $c$, $K$ and $K'$, which depend only on $F$ and $G$, such that

$$P\bigl[n\|(\tilde\Lambda_n-\Lambda)\exp(-\Lambda)-n^{-1/2}(1-F)B_n\circ L\|>x+K\log n\bigr]\le K'e^{-cx},$$

for all $x\in[0,n]$. Hence,

$$P\bigl[n\|\tilde\Lambda_n-\Lambda-n^{-1/2}B_n\circ L\|>x+K\log n\bigr]\le K'\exp(-cx),$$

and it remains to show that $\tilde\Lambda_n$ is close enough to $\Lambda_n$. By Taylor's expansion, one has, for all $i$ with $t_i\in[0,1]$,

$$0\le\tilde\Lambda_n(t_i)-\Lambda_n(t_i)\le\sum_{j\le i}\frac{1}{2(n_j-1)^2}\le\frac{n}{2(N\vee1-1)^2},$$

where we recall that $N$ is the number of $X_i$'s that are greater than or equal to 1. Both $\tilde\Lambda_n$ and $\Lambda_n$ are constant on the intervals $[t_j,t_{j+1})$. As $N$ is a binomial variable with parameter $n$ and probability of success $1-\gamma$ [see (33)], one can then derive from Hoeffding's inequality that

$$P\bigl[n\|\tilde\Lambda_n-\Lambda_n\|>x\bigr]\le K\exp(-cn)\le K\exp(-cx),$$

for some $K>0$, $c>0$ and all $x\in(K',n]$. The result follows.
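The last part of the argument compares $\tilde\Lambda_n=-\log(1-F_n)$ with $\Lambda_n$. The sketch below illustrates that comparison on a small data set, under two assumptions that are not taken from the text above: that $\Lambda_n$ is the Nelson-Aalen estimator with increments $1/n_j$ at uncensored times, where $n_j$ is the number of observations still at risk, and that there are no ties. The data and the function name are hypothetical.

```python
import math


def cumulative_hazard_estimates(times, uncensored):
    """Nelson-Aalen estimate and -log(1 - Kaplan-Meier) at each uncensored time.

    Assumes distinct observation times. Returns a list of tuples
    (time, nelson_aalen, minus_log_km_survival, gap_bound), where gap_bound
    is the running sum of 1 / (2 (n_j - 1)^2) over uncensored times so far.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    nelson_aalen = 0.0
    minus_log_survival = 0.0
    gap_bound = 0.0
    rows = []
    for rank, i in enumerate(order):
        at_risk = n - rank  # number of observations still at risk
        if not uncensored[i]:
            continue
        if at_risk == 1:
            break  # Kaplan-Meier survival would hit 0, so the log is undefined
        nelson_aalen += 1.0 / at_risk
        minus_log_survival -= math.log(1.0 - 1.0 / at_risk)
        gap_bound += 1.0 / (2.0 * (at_risk - 1) ** 2)
        rows.append((times[i], nelson_aalen, minus_log_survival, gap_bound))
    return rows


# Hypothetical sample: ten observation times with censoring indicators.
times = [0.05, 0.12, 0.20, 0.33, 0.41, 0.58, 0.66, 0.79, 0.90, 0.97]
uncensored = [True, True, False, True, True, False, True, True, False, False]
for row in cumulative_hazard_estimates(times, uncensored):
    print(row)
```

On each row the difference between the third and second entries is nonnegative and no larger than the fourth entry, in line with the elementary inequality $0\le-\log(1-1/m)-1/m\le1/(2(m-1)^2)$ for $m\ge2$.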

6.2. Proof of Theorem 4. Fix $t\in[0,1]$ and $x>0$. As $\Lambda_n-\Lambda$ is a martingale, Doob's inequality yields

$$E\Bigl[\sup_{t\le u\le t+x}(M_n(u)-M_n(t))^2\Bigr]\le4E\bigl((M_n(1\wedge(t+x))-M_n(t))^2\bigr). \tag{38}$$

But $n(\Lambda_n(1\wedge(t+x))-\Lambda_n(t))$ has a Poisson distribution with expectation $n(\Lambda(1\wedge(t+x))-\Lambda(t))$. Thus, its variance is bounded by $Knx$, where $K$ is the supremum norm of $\lambda$ on $[0,1]$, and (35) holds for all $x>0$ and $t\in[0,1]$. We can handle the case $u<t$ as in the proof of Theorem 3, whence (A2) and (A2'). Now, $\Lambda_n$ can jump in $(1-\delta/n,1)$ only if at least one process $N_i$ jumps in this interval. But the jumps of $N_i$ have height 1, so for every $\delta>0$,

$$P(\Lambda_n\text{ jumps in }(1-\delta/n,1))\le nP(N_1(1)-N_1(1-\delta/n)\ge1).$$






The variable $N_1(1)-N_1(1-\delta/n)$ has a Poisson distribution with expectation $\Lambda(1)-\Lambda(1-\delta/n)$, so by Markov's inequality,

$$P(\Lambda_n\text{ jumps in }(1-\delta/n,1))\le K\delta.$$

We can proceed likewise to control the probability that $\Lambda_n$ jumps in $(0,\delta/n)$, so (A3) follows from Lemma 1. It remains to prove (A4). For this task, fix $q>2$ and for every $k=0,\dots,n$, let $t_k=k/n$. We have

$$E|M_n(t_k)-M_n(t_{k-1})|^q\le Kn^{-q} \tag{39}$$

for all $k\ge1$ and some $K>0$. The increments of $M_n$ are independent, so by Theorem 5 in [15], there are versions of $M_n$ and the standard Brownian motion $B_n$ such that

$$E\Bigl[\max_{1\le k\le n}|M_n(t_k)-n^{-1/2}B_n(\Lambda(t_k))|^q\Bigr]\le Kn^{1-q}$$

for some $K>0$. One then obtains, using (39), monotonicity of $\Lambda_n$ and properties of Brownian motion, that there is a $K>0$ such that

$$E\Bigl[\sup_{t\in[0,1]}|M_n(t)-n^{-1/2}B_n(\Lambda(t))|^q\Bigr]\le Kn^{1-q}.$$

This holds for any $q>2$, hence, in particular, for some $q>12$. Thus, from Markov's inequality, (A4) holds with $L=\Lambda$. □
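The variance bound behind (35) in the Poisson model can be illustrated numerically. Since $n(\Lambda_n(b)-\Lambda_n(a))$ is Poisson with mean $n(\Lambda(b)-\Lambda(a))$, the increment $M_n(b)-M_n(a)$ has variance $(\Lambda(b)-\Lambda(a))/n$, and the right-hand side of (38) equals four times that quantity. The sketch below assumes a hypothetical decreasing intensity $\lambda(t)=2-t$ on $[0,1]$, so that $\Lambda(t)=2t-t^2/2$ and $\sup\lambda=2$; none of these choices come from the paper.

```python
def cumulative_intensity(t: float) -> float:
    """Lambda(t) for the hypothetical intensity lambda(t) = 2 - t on [0, 1]."""
    return 2.0 * t - 0.5 * t * t


def doob_bound(t: float, x: float, n: int) -> float:
    """Right-hand side of (38): 4 Var(M_n(1 ^ (t + x)) - M_n(t)) = 4 (Lambda increment) / n."""
    upper = min(1.0, t + x)
    return 4.0 * (cumulative_intensity(upper) - cumulative_intensity(t)) / n


def crude_bound(x: float, n: int, sup_intensity: float = 2.0) -> float:
    """The bound 4 K x / n, with K the supremum norm of the intensity."""
    return 4.0 * sup_intensity * x / n


n = 1000
for t in (0.0, 0.3, 0.9):
    for x in (0.01, 0.1, 0.5):
        print(t, x, doob_bound(t, x, n), crude_bound(x, n))
```

In every row the first bound is dominated by the second, which is the form $Kx/n$ required in (35).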

6.3. Proof of Theorem 5. (i) We have $(u+v)^2\le2(u^2+v^2)$ for all real numbers $u$ and $v$. Hence, for all $t\in[0,1]$ and $x>0$,

$$E\Bigl[\sup_{|u-t|\le x}(M_n(u)-M_n(t))^2\Bigr]\le\frac{2}{n^2}E\Bigl[\sup_{|u|\le x}\Bigl(\sum_{i\le n(t+u)}\varepsilon_{i,n}-\sum_{i\le nt}\varepsilon_{i,n}\Bigr)^2\Bigr]+\frac{K}{n^2}$$

for some $K>0$, which depends only on $\lambda$. By Doob's inequality, this is no more than

$$\frac{8}{n^2}E\Bigl(\sum_{nt<i\le n(t+x)}\varepsilon_{i,n}\Bigr)^2+\frac{8}{n^2}E\Bigl(\sum_{n(t-x)<i\le nt}\varepsilon_{i,n}\Bigr)^2+\frac{K}{n^2}\le\frac{K'x}{n}$$

for all $x\ge1/n$, whence (A2). By definition, $\Lambda_n$ jumps at times $i/n$, $i=1,\dots,n$. If $t\in\{0,1\}$, we thus have for every $x\in(0,1/n)$ that

$$\sup_{|t-u|\le x}|M_n(u)-M_n(t)|=\sup_{|t-u|\le x}|\Lambda(u)-\Lambda(t)|\le Kx,$$

whence (A2'). Moreover, it is clear from Lemma 1 that (A3) holds.

(ii) From Theorem 5 in [15], there exist versions of $(\varepsilon_{i,n})$ and the standard Brownian motion $B_n$ such that

$$E\Bigl[\sup_{t\in[0,1]}\Bigl|\frac1n\sum_{i\le nt}\varepsilon_{i,n}-n^{-1/2}B_n\Bigl(\frac1n\sum_{i\le nt}\sigma^2(i/n)\Bigr)\Bigr|^q\Bigr]\le Kn^{1-q}.$$

Thanks to Markov's inequality, one can then derive (A4) from properties of $B_n$ and the regularity assumptions on $\sigma^2$.



6.4. Proof of Theorem 6. Fix $t\in[0,1]$, $x>0$, and define

$$\mathcal M_n(u)=\frac{\Lambda_n(u)-\Lambda_n(t)}{\Lambda(u)-\Lambda(t)},\qquad u\in[0,1],$$

where we recall that $\Lambda_n$ is the empirical distribution function of the sample $X_1,\dots,X_n$. By Lemma 2.2 in [8], the process $\{\mathcal M_n(u),\ u\in(t,1]\}$ is a reverse time martingale conditionally on $\Lambda_n(t)$. Since $\Lambda$ is increasing and $\lambda$ is bounded, Doob's inequality yields

$$E\Bigl[\sup_{x/2\le u-t\le x}(M_n(u)-M_n(t))^2\Bigr]\le Kx^2E\Bigl[\Bigl(\frac{M_n(t+x/2)-M_n(t)}{\Lambda(t+x/2)-\Lambda(t)}\Bigr)^2\Bigr].$$

But $n(\Lambda_n(t+x/2)-\Lambda_n(t))$ is a binomial variable with parameter $n$ and probability of success $\Lambda(t+x/2)-\Lambda(t)$. Moreover, $\lambda$ is bounded away from zero, whence

$$E\Bigl[\sup_{x/2\le u-t\le x}(M_n(u)-M_n(t))^2\Bigr]\le\frac{Kx}{n}$$

for all $x>0$ and $t\in[0,1]$. To handle the case $u<t$, we use the fact that the process $\{\mathcal M_n(u),\ u\in[0,t)\}$ is a forward time martingale conditionally on $\Lambda_n(t)$ (see Lemma 2.2 in [8]). Whence, (A2) and (A2'). Now, $\Lambda_n$ jumps at times $X_1,\dots,X_n$. As $\lambda$ is bounded, the probability that $\Lambda_n$ jumps in $(0,\delta/n)$ or in $(1-\delta/n,1)$ is no more than

$$nP(X_1\in(0,\delta/n)\cup(1-\delta/n,1))$$

for every $\delta>0$. This is no more than $2K\delta$, where $K$ is the supremum norm of $\lambda$, so (A3) follows from Lemma 1. Finally, it follows from the Hungarian embedding of [10] that (A4) holds with $L=\Lambda$ and $B_n$ a Brownian bridge.